
SERVICE CONTROL MANAGER TOOL EXECUTION

Technical Field

The present invention relates to system administration management, and, in particular, to service control manager modules.

Background

Computer systems are becoming increasingly commonplace in homes and businesses throughout the world. As the number of computer systems increases, more and more computer systems are becoming interconnected via networks. These networks include local area networks (LANs). LANs also frequently have an interface to other networks, such as the Internet, and this interface needs to be monitored and controlled by network management on the LAN.

One concern encountered with networks is network management. Network management refers to monitoring and controlling network devices, and includes the ability of an individual, typically referred to as an administrative user, to access, monitor, and control the devices that are part of the network, or the devices that are part of a network coupled to other computer systems. Such access, monitoring, and control often include the ability to check the operating status of devices, receive error information for devices, change configuration values, and perform other management functions. As the size of networks increases, so too does the need for network management.

The operating system of most computers provides an administration tool or a system administration manager for invoking and performing system management tasks. The hardware of a computer system, the various facilities included within the operating system (such as the file system facility, the print spooling facility, and the networking facility), and the operating system itself must all be managed. This means that computer systems require some involvement by a human user or a manager of the computer system for such operations as specifying certain configuration parameters, monitoring ongoing activity, or troubleshooting a problem that has arisen. In many operating systems, these management or administration tasks can be performed manually, by directly editing configuration files or directly invoking specific administration utility programs. But in large operating systems involving distributed systems, a more efficient method for managing and monitoring tasks may be needed, especially in the context of tool execution.

Summary

A service control manager (SCM) tool execution mechanism may enable SCM users to execute the SCM tools across a set of defined distributed nodes (systems) by providing a secure mechanism, referred to as a distributed task facility (DTF), to integrate different operations, such as commands or scripts, and execute the operations across a set of distributed nodes.

The SCM tool execution method may include receiving a request, which includes task information, from a user through a client to run a tool on one or more nodes, retrieving tool definition, node definition and user definition from a domain manager, and validating the task information received from the user. A runnable tool may be created based on the task information and the tool definition, and the SCM module may check user authorization to run the tool on all of the nodes requested, i.e., whether the user is assigned the roles associated with the tool on all of the nodes. The client may next pass the runnable tool to a DTF, which may then issue a task identifier based on the runnable tool, and pass the runnable tool to agents associated with the nodes to execute the tool. Finally, the DTF may collect task results or failure reports from the agents, and return the task results to the client and then to the user.

Description of the Drawings



The detailed description refers to the following drawings, in which like numbers refer to like elements, and in which:

Figure 1 illustrates a computer network system with which the present invention may be used;

Figure 2 illustrates the relationships between the user, role, node, tool and authorization objects;

Figure 3 illustrates the relationships between clients, a DTF and agents running on the nodes; and

Figure 4 is a flow chart of a method for executing tools in the SCM module.

Detailed Description

A service control manager (SCM) module multiplies system administration effectiveness by distributing the effects of existing tools efficiently across managed servers. The phrase "service control manager" is intended as a label only, and different labels can be used to describe modules or other entities having the same or similar functions.

In the SCM domain, the managed servers (systems) are referred to as "managed nodes" or simply as "nodes". SCM node groups are collections of nodes in the SCM module. They may have overlapping memberships, such that a single node may be a member of more than one group. The grouping mechanism may allow flexible partitioning of the SCM module so that users may use it to reflect the way nodes are already grouped in their environment.

Figure 1 illustrates a computer network system with which the present invention may be used. The network system includes an SCM 110 running on a Central Management Server (CMS) 100 and one or more nodes 130 or node groups 132 managed by the SCM 110. The one or more nodes 130 and node groups 132 make up an SCM cluster 140. See ServiceControl Manager Technical Reference, HP® part number B8339-90019, available from Hewlett-Packard Company, Palo Alto, CA, which is hereby incorporated by reference and which is also accessible at <http://www.software.hp.com/products/scmgr>, for a more detailed description of the SCM 110.

The CMS 100 can be implemented with, for example, an HP-UX 11.x server running the SCM 110 software. The CMS 100 includes a memory 102, a secondary storage device (not shown), a processor 108, an input device (not shown), a display device (not shown), and an output device (not shown). The memory 102 may include computer readable media, RAM or similar types of memory, and it may store one or more applications for execution by processor 108, including the SCM 110 software. The secondary storage device may include computer readable media, a hard disk drive, floppy disk drive, CD-ROM drive, or other types of non-volatile data storage. The processor 108 executes the SCM software and other application(s), which are stored in memory or secondary storage, or received from the Internet or other network 116. The input device may include any device for entering data into the CMS 100, such as a keyboard, key pad, cursor-control device, touch-screen (possibly with a stylus), or microphone. The display device may include any type of device for presenting a visual image, such as, for example, a computer monitor, flat-screen display, or display panel. The output device may include any type of device for presenting data in hard copy format, such as a printer; other types of output devices include speakers or any device for providing data in audio form. The CMS 100 can possibly include multiple input devices, output devices, and display devices.

The CMS 100 itself may be required to be a managed node, so that multi-system aware (MSA) tools (described later) may be invoked on the CMS. All other nodes 130 may need to be explicitly added to the SCM cluster 140.

Generally, the SCM 110 supports managing a single SCM cluster 140 from a single CMS 100. All tasks performed on the SCM cluster 140 are initiated on the CMS 100 either directly or remotely, for example, by reaching the CMS 100 via a web connection 114. Therefore, the workstation 120 at which a user sits only needs a web connection 114 over a network 116, such as the Internet or another type of computer network, to the CMS 100 in order to perform tasks on the SCM cluster 140. The CMS 100 preferably also includes a centralized data repository 104 for the SCM cluster 140, a web server 112 that allows web access to the SCM 110, and a depot 106 that includes products used in configuring the nodes 130. A user interface may only run on the CMS 100, and no other node 130 in the SCM module may execute remote tasks, access the repository 104, or perform any other SCM operations.

Although the CMS 100 is depicted with various components, one skilled in the art will appreciate that this server can contain additional or different components. In addition, although aspects of an implementation consistent with the present invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on or read from other types of computer program products or computer-readable media, such as secondary storage devices, including hard disks, floppy disks, or CD-ROM; a carrier wave from the Internet or other network; or other forms of RAM or ROM. The computer-readable media may include instructions for controlling the CMS 100 to perform a particular method.

A central part of the SCM module 110 is the ability to execute various management commands or applications on one or more nodes simultaneously. The commands or applications may need to be encapsulated with an SCM tool, which is typically used to copy files and/or execute commands on the target nodes 130. The SCM tool may run simple commands such as bdf(1) or mount(1M), launch single-system interactive applications such as System Administration Manager (SAM) or Glance, launch multi-system aware applications such as Ignite/UX or Software Distributor (SD), or perform other functions. The tool may be defined either with an SCM tool definition language through a command line interface (CLI) or with an SCM-provided graphical user interface (GUI).



There are two general types of tools: single-system aware (SSA) tools and multi-system aware (MSA) tools. SSA tools may run on a node 130 and may only affect the operation of that node 130. To run SSA tools on multiple target nodes 130, the SCM module 110 may execute the tools on each target node 130. In addition to executing commands or launching applications, SSA tools may copy files from the CMS 100 to the target nodes 130. In this exemplary embodiment, files may only be copied from the CMS 100 to the managed nodes 130, not from the nodes 130 back to the CMS 100.

MSA tools may run on a single node 130 but may be able to operate on multiple other nodes 130. MSA tools are applications that execute on a single node but can detect and contact other nodes to accomplish their work, and this contact is outside the control of the SCM module 110. This type of application may need to have a list of nodes 130 passed as an argument at runtime. The node 130 where the application will execute may need to be specified at tool creation time, not at runtime. The target nodes 130 selected by the user may be passed to an MSA tool via a target environment variable that contains the target node list. In this exemplary embodiment, MSA tools may not copy files to either the manager node 100 or the target nodes 130. Therefore, an execution command string may be required for MSA tools.

An SCM user may be a user that is known to the SCM module 110 and has some privileges and/or management roles. An SCM role, which is an expression of intent and a collection of tools for accomplishing that intent, typically defines what the user is able to do on the associated nodes 130 or node groups 132, e.g., whether a user may run a tool on a node 130. Typically, in order to start the SCM module 110 or execute any SCM tools, the user may need to be added to the SCM module 110 and authorized via either the GUI or the command line interface (CLI). All SCM module 110 operations may be authorized based on the user's SCM authorization configuration and/or whether or not the user has been granted the SCM trusted user privilege.

The SCM user may, depending upon the roles assigned, manage systems via the SCM module 110. In addition, the user may examine the SCM module log and scan the group and role configurations. When the SCM user runs a tool, the result may be an SCM task. The SCM module 110 typically assigns a task identifier to every task after it has been defined and before it is run on any target nodes 130. This identifier may be used to track the task and to look up information about the task later in an SCM central log.

HP No. 10011600-1
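For the MSA tools described earlier, the user-selected targets reach the tool through an environment variable rather than through the SCM module contacting each node. The following is a minimal sketch only: the variable name `TARGET_NODES`, the space-separated encoding and the helper names are assumptions, since the text names neither the variable nor its format. A child Python process stands in for the MSA tool.

```python
import os
import subprocess
import sys
from collections.abc import Mapping

# Hypothetical variable name; the text only says "a target environment variable".
TARGET_VAR = "TARGET_NODES"


def build_msa_environment(targets: list[str]) -> dict[str, str]:
    """Copy the current environment and add the user-selected target list."""
    env = dict(os.environ)
    # Assumed encoding: node names separated by single spaces.
    env[TARGET_VAR] = " ".join(targets)
    return env


def read_msa_targets(environ: Mapping[str, str] = os.environ) -> list[str]:
    """What a (hypothetical) MSA tool might do at startup to recover its targets."""
    return environ.get(TARGET_VAR, "").split()


env = build_msa_environment(["nodeA", "nodeB", "nodeC"])
# Stand-in for the MSA tool: a child process that prints the list it received.
child = [sys.executable, "-c", f"import os; print(os.environ['{TARGET_VAR}'].split())"]
result = subprocess.run(child, env=env, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

The tool runs on one node and decides for itself how to contact the listed targets, which matches the text's point that this contact is outside the SCM module's control.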

An SCM trusted user is an SCM user responsible for the configuration and general administration of the SCM module 110. The trusted user is typically a manager or supervisor of a group of administrators whom a company trusts, or another trusted individual. Entrusted with the highest authority, the trusted user may grant any authorization that is possible, including authorizing himself to execute any system management task on any of the nodes (machines) managed by the SCM module 110. The capabilities of the trusted user include, for example, one or more of the following: creating or modifying a user's security profile; adding, modifying or deleting a node or node group; tool modification; and tool authorization. The granting of these privileges implies a trust that the user is responsible for configuring and maintaining the overall structure of the SCM module 110.

An SCM authorization model supports the notion of assigning to users the ability to run a set of tools on a set of nodes. An authorization object is an association that links a user to a role on either a node or a node group. Each role may have one or more tools, and each tool may belong to one or more roles. When users are given the authority to perform some limited set of functionality on one or more nodes, the authorization is done based upon roles and not on tools. Roles allow the sum total of functionality represented by all the tools to be divided into logical sets that correspond to the responsibilities that would be given to the various administrators. Accordingly, different roles may be configured and assigned with authorization. For example, a backup administrator may be assigned a "backup" role containing tools that perform backups, manage scheduled backups, view backup status, and perform other backup functions. On the other hand, a database administrator with a "database" role may have a different set of tools. When a user attempts to run a tool on a node, the user may need to be checked to determine whether the user is authorized to fulfill a certain role on the node and whether that role contains the tool. Once a user is assigned a role, the user may be given access to any newly created tools that are later added to the role. In the example given above, the backup administrator may be assigned the "backup" role for a group of systems that run a specific application. When new backup tools are created and added to the "backup" role, the backup administrator may immediately be given access to the new tools on those systems.

Figure 2 illustrates the relationships between the user 210, role 220, node 130, tool 240, and authorization 250 objects. User objects 210 represent users 210, role objects 220 represent roles 220, node objects 130 represent nodes 130, tool objects 240 represent tools 240, and authorization objects 250 represent authorizations 250. However, for purposes of this application, these terms are used interchangeably. Each authorization object 250 links a single user object 210 to a single role object 220 and to a single node object 130 (or a node group object 132). Each role object 220 may correspond to one or more tool objects 240, and each tool object 240 may correspond to one or more role objects 220. Each user object 210 may be assigned multiple authorizations 250, as may each role object 220 and each node object 130. For example, Role 1 may contain Tools 1-N, and User 1 may be assigned Roles 1-M by the authorization model on Node 1. Consequently, User 1 may run Tools 1-N on Node 1, based upon the role assigned, Role 1.

Table 1 illustrates an example of a data structure for assigning tools 240 to different roles 220. 
Each tool 240 may correspond to a single command or application, but a single command may 
correspond to more than one tool 240 if there are other differences in how the tool 240 runs the 
command. Table 2 illustrates an example of a data structure for assigning the roles 220 to different 
users 210 on different nodes 130. 
Table 2

Although Figure 2 shows a node authorization, a similar structure exists for a node group 132 authorization. The SCM authorization model may be deployed by using node group 132 authorizations more often than node 130 authorizations. This model makes adding new nodes simpler because, by adding a node 130 to an existing group 132, any authorizations associated with the group 132 may be inherited at run time by the node 130.

The authorization model for determining whether a user may execute a tool 240 on a set of nodes 130 may be defined as an "all or none" model. Therefore, the user 210 must have a valid authorization association for each target node 130 in order to execute the tool 240. If authorization does not exist for even one of the nodes 130, the tool execution fails.
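The role-based, "all or none" check described above can be sketched as follows. This is an illustration under stated assumptions, not the SCM implementation: the class and method names (`Authorization`, `AuthorizationModel`, `may_run`) and the in-memory dictionaries are hypothetical stand-ins for objects held in the repository 104. Node-group membership is resolved at check time, which mirrors the run-time inheritance of group authorizations, and tools added to a role become usable immediately.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Authorization:
    """Links one user to one role on one node or one node group."""
    user: str
    role: str
    scope: str  # a node name or a node-group name


@dataclass
class AuthorizationModel:
    roles: dict[str, set[str]] = field(default_factory=dict)        # role -> tools
    node_groups: dict[str, set[str]] = field(default_factory=dict)  # group -> nodes
    authorizations: set[Authorization] = field(default_factory=set)

    def _covers(self, scope: str, node: str) -> bool:
        # Group membership is looked up at check time, so a node added to a
        # group later inherits the group's authorizations automatically.
        return scope == node or node in self.node_groups.get(scope, set())

    def may_run_on(self, user: str, tool: str, node: str) -> bool:
        # Authorized if some role assigned to the user on this node contains the tool.
        return any(
            a.user == user
            and tool in self.roles.get(a.role, set())
            and self._covers(a.scope, node)
            for a in self.authorizations
        )

    def may_run(self, user: str, tool: str, nodes: list[str]) -> bool:
        # "All or none": one unauthorized target rejects the whole request.
        return bool(nodes) and all(self.may_run_on(user, tool, n) for n in nodes)


model = AuthorizationModel(
    roles={"backup": {"run_backup", "backup_status"}, "database": {"db_check"}},
    node_groups={"app_servers": {"node1", "node2"}},
    authorizations={Authorization("alice", "backup", "app_servers")},
)
print(model.may_run("alice", "run_backup", ["node1", "node2"]))  # True
print(model.may_run("alice", "run_backup", ["node1", "node3"]))  # False: node3 not covered
model.node_groups["app_servers"].add("node3")
print(model.may_run("alice", "run_backup", ["node1", "node3"]))  # True after joining the group
```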

The SCM module 110 may also include security features to secure transactions transmitted across the network. All network transactions may be digitally signed using a public/private key pair. The recipient of a network transmission may thus be assured of who sent the transmission and that the data was not altered in transit. A hostile party on the network may be able to view the transactions, but cannot counterfeit or alter them.

Referring to Figure 3, the five separate processes involved in tool execution may include a client process, a domain manager process, a log manager process, a DTF process and an agent process. Tool execution may start with a request from a user 210, made through a client 310, to run a tool on one or more nodes 130. The client 310 is a program that interacts with the user 210 and displays information about the computer systems that reside on the nodes 130. There are two types of client 310: a graphical user interface (GUI) client, which may be named "scmgr", and a command line interface (CLI) client for executing tasks, which may be named "mxexec". Examples will be provided with respect to the CLI client only; a GUI client may function in a similar fashion.

The client 310 may first contact a domain manager 330 to look up user, node, and tool information and check user authorization, and then log the progress with a log manager 334. The domain manager 330 is the "brain" of the SCM module 110 and may be connected to the repository 104, which stores the definitions of all the objects. The log manager 334 may manage a log file, take log requests from the clients 310, and write the requests to the SCM log file (described in detail later). Then, the client 310 may contact a DTF 340 to pass on the task to be executed. The DTF 340 may execute tasks by passing the task definitions and information to agents 370 running on the managed nodes 130. The DTF 340 is the "heart" of all task execution activity in that all of the execution steps must go through the DTF 340. The DTF 340 typically obtains an authorized runnable tool from the clients 310, distributes the tool execution across multiple nodes 130, and returns execution results to the clients 310 and to the user 210. The final process, the agent process, typically involves running the commands on the managed nodes 130. The DTF 340 may provide task manager interfaces 350 that may be called by the clients 310 to perform a task, to cancel or kill a task, or to monitor task status. The DTF 340 may also provide target liaison interfaces 360 that may be used by the agents 370 to communicate with the DTF 340 in order to process assigned tasks.

To start a task on the managed nodes 130, the DTF 340 may package up the task in a task description object, create target liaison objects 360 to track the target nodes 130, and pass them both to the agents 370 on the target nodes 130. The task description object may include task information received from the user, such as the name of the tool to be run, the location of the tool, the nodes on which to run the tool, and required arguments of the tool, if any. The task description object may be serializable, so it may be shipped over the remote call in its entirety. But the target liaison 360 is typically a remote object, and so only a remote reference to it may be shipped over with the remote call.
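The difference between the serializable task description (shipped by value) and the target liaison (shipped by reference) can be illustrated with a small sketch. All class names, field names and values here are hypothetical, `pickle` stands in for whatever remote-call serialization the SCM module uses, and `RemoteRef` is a stand-in for a real remote reference: only a handle crosses the wire while the liaison object itself stays in the DTF process.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class TaskDescription:
    """Serializable value object: travels to the agent in its entirety."""
    task_id: str
    tool_name: str
    tool_location: str
    nodes: list[str]
    arguments: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class RemoteRef:
    """Stand-in for a remote reference: only an identifying handle is shipped."""
    object_id: str


class TargetLiaison:
    """Stays in the DTF process; agents would call back into it through a RemoteRef."""

    def __init__(self, object_id: str, node: str):
        self.object_id = object_id
        self.node = node
        self.state = "MX_TARGET_PENDING"

    def ref(self) -> RemoteRef:
        return RemoteRef(self.object_id)


# Illustrative values only.
desc = TaskDescription("task-42", "bdf", "/usr/bin/bdf", ["node1", "node2"])
liaisons = {n: TargetLiaison(f"liaison-{n}", n) for n in desc.nodes}

# What would cross the wire: the whole description plus one reference per target.
payload = pickle.dumps((desc, [liaison.ref() for liaison in liaisons.values()]))
received_desc, received_refs = pickle.loads(payload)
print(received_desc == desc, [r.object_id for r in received_refs])
```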

An important part of the task description is the task identifier described above, which may be a unique string value. It may be based upon a 32-bit integer value that will not repeat in over 60 years, assuming one new task is created each second.
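The "over 60 years" figure can be checked with simple arithmetic. The text does not say whether the 32-bit value is signed; a signed range gives about 68 years at one task per second, and an unsigned range about 136 years, so either reading supports the claim. The generator below is a hypothetical sketch (`TaskIdGenerator` and the zero-padded format are assumptions); it derives the identifier from a counter only, whereas the text says the DTF bases the identifier on the runnable tool.

```python
import itertools
import threading

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year: 31,557,600 s

# At one new task per second, a counter of this range lasts this many years.
signed_years = (2**31 - 1) / SECONDS_PER_YEAR    # about 68.05
unsigned_years = (2**32 - 1) / SECONDS_PER_YEAR  # about 136.1


class TaskIdGenerator:
    """Hypothetical DTF-side generator: a 32-bit counter rendered as a string."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # the DTF accepts simultaneous requests

    def next_id(self) -> str:
        with self._lock:
            value = next(self._counter)
        if value > 2**31 - 1:
            raise OverflowError("32-bit task identifier space exhausted")
        return f"{value:010d}"  # 2**31 - 1 has ten decimal digits


gen = TaskIdGenerator()
print(round(signed_years, 2), round(unsigned_years, 1), gen.next_id(), gen.next_id())
```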




Figure 4 is a flow chart of a method for executing tools in the SCM module 110. This method may be implemented, for example, in software modules for execution by processor 108. First, the SCM module 110 may receive a request from a user 210, through the client process, to run a tool on one or more nodes 130, step 402. The request may include task information, such as the name of the tool to be run, the location of the tool, the nodes on which to run the tool, and required arguments of the tool, if any. Next, the SCM module 110 may retrieve the tool definition, node definition and user definition from the domain manager 330, step 404, and validate the task information received from the user 210, step 406. The domain manager 330, which is connected to the repository 104, may be contacted to provide the tool definition or information about the nodes 130 or the user 210 whenever the clients 310 need it. An example of a tool definition is described in a United States patent application entitled "Service Control Manager Tool Definition", filed on the same day herewith, which is incorporated herein by reference.

The validation may include checking, for example, whether the requested tool exists and whether the required arguments of the tool are given.
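A minimal sketch of the validation in step 406 follows. The checks for tool existence and required arguments come from the text; the user and node checks are assumptions based on the user and node definitions retrieved in step 404. The types and the `validate` helper are hypothetical, and plain dictionaries and sets stand in for the domain manager 330.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDefinition:
    name: str
    required_args: list[str] = field(default_factory=list)


@dataclass
class TaskRequest:
    user: str
    tool_name: str
    nodes: list[str]
    arguments: dict[str, str] = field(default_factory=dict)


def validate(request: TaskRequest, tools: dict[str, ToolDefinition],
             known_nodes: set[str], known_users: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the request is valid."""
    problems = []
    if request.user not in known_users:  # assumption: user definition check
        problems.append(f"unknown user: {request.user}")
    tool = tools.get(request.tool_name)
    if tool is None:
        problems.append(f"unknown tool: {request.tool_name}")
    else:
        missing = [a for a in tool.required_args if a not in request.arguments]
        if missing:
            problems.append(f"missing arguments: {', '.join(missing)}")
    if not request.nodes:
        problems.append("no target nodes given")
    unknown = [n for n in request.nodes if n not in known_nodes]  # assumption
    if unknown:
        problems.append(f"unknown nodes: {', '.join(unknown)}")
    return problems


tools = {"mount": ToolDefinition("mount", ["mount_point"])}
req = TaskRequest("alice", "mount", ["node1", "node9"])
print(validate(req, tools, {"node1", "node2"}, {"alice"}))
# ['missing arguments: mount_point', 'unknown nodes: node9']
```

Collecting every problem, rather than stopping at the first, lets the client report all errors to the user in one round trip; that is a design choice of this sketch, not something the text specifies.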

After the request is validated, the SCM module 110 may create a runnable tool object based on the task information and the tool definition, step 408. The runnable tool object may encapsulate the tool 240, the task information received from the user 210, and information that may be picked up from the environment, such as the user's name.

Then the SCM module 110 may need to check whether the user 210 is authorized to run the tool 240 on all of the nodes 130 requested, i.e., whether the user 210 is assigned one or more of the roles 220 associated with the tool 240 on all of the nodes 130. For example, if a user 210 requests to run a tool 240 on two nodes 130, and the user 210 is only authorized to run the tool on one node 130 but not the other, the SCM module 110 will not run the tool 240 on either node, due to the "all or none" authorization model. This user authorization checking, step 410, may be done by a security manager 332, which may be a subsection of the domain manager 330.

Once the security manager 332 has determined that the user 210 is authorized to run the tool 240 on all of the nodes 130 requested, the security manager 332 may return this information to the client 310, and the client 310 may pass the runnable tool to the DTF 340, step 412. The DTF 340 may then issue a task identifier based on the runnable tool, step 414, and pass the runnable tool to the agents 370 associated with the nodes 130 to run the tool 240 using POSIX standard interfaces, step 416. POSIX is an IEEE standard; the HP-UX operating system, for example, is POSIX compliant. Processes that run on a POSIX-compliant system may have access to a standard output, to which regular output is written, and a standard error output, to which error messages are written. Standard input is how a POSIX process reads input from a user or a file. The POSIX model presents input/output (I/O) operations as file operations: a process reads input from a file on the file system and writes output to a file. Thus standard input, standard output and standard error are three standardized files, and when running a command or program on a POSIX-compliant operating system, a user 210 may specify and control what is attached to those three files.
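The three standard files described above are what an agent would capture when it runs a tool's command. As a sketch, Python's `subprocess.run` can attach the child's standard input, standard output and standard error to pipes and return the exit code; the `TargetOutput` and `run_command` names are hypothetical, and a short Python one-liner stands in for a real tool command so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TargetOutput:
    """Mirrors the text: exit code, stdout and stderr from one target node."""
    exit_code: int
    stdout: str
    stderr: str


def run_command(argv: list[str], stdin_text: str = "") -> TargetOutput:
    """Run argv with its three standard files attached to pipes we control."""
    completed = subprocess.run(
        argv,
        input=stdin_text,     # becomes the child's standard input
        capture_output=True,  # standard output and standard error collected separately
        text=True,
    )
    # The exit code is returned as-is rather than interpreted.
    return TargetOutput(completed.returncode, completed.stdout, completed.stderr)


# Portable stand-in for a tool command: echo stdin in upper case,
# write a warning to stderr, and exit with status 3.
script = (
    "import sys; data = sys.stdin.read(); print(data.upper()); "
    "print('warning', file=sys.stderr); sys.exit(3)"
)
out = run_command([sys.executable, "-c", script], stdin_text="hello")
print(out.exit_code, repr(out.stdout), repr(out.stderr))
```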

The task manager interface 350 may use running tool objects to perform the tasks, one per task. The DTF 340 may have a hash table that contains references to all the running tool objects that are active. A hash table is a common data structure that provides fast indexing of information by applying a hash function to a key to compute the location (address) at which the corresponding entry is stored. The hash key for this hash table may be the task identifier, a string value generated by the DTF 340 based on the runnable tool, which may be guaranteed to be unique.
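Python's built-in `dict` is itself a hash table, so the DTF's two lookup tables (active running tools, and the completed task objects described next in the text) can be sketched directly with it. The class and method names are hypothetical; the sketch only shows entries being keyed by task identifier and a running tool being dropped once its completed task object is recorded.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RunningTool:
    task_id: str
    tool_name: str
    targets: list[str]


@dataclass
class CompletedTask:
    task_id: str
    task_state: str
    target_states: dict[str, str] = field(default_factory=dict)


class TaskRegistry:
    """Two hash tables keyed by task identifier."""

    def __init__(self):
        self._lock = threading.Lock()  # the DTF is multi-threaded
        self.running: dict[str, RunningTool] = {}
        self.completed: dict[str, CompletedTask] = {}

    def start(self, tool: RunningTool) -> None:
        with self._lock:
            if tool.task_id in self.running or tool.task_id in self.completed:
                raise KeyError(f"duplicate task identifier {tool.task_id}")
            self.running[tool.task_id] = tool

    def finish(self, task_id: str, task_state: str,
               target_states: dict[str, str]) -> CompletedTask:
        with self._lock:
            # Removing the entry drops the registry's reference to the running tool.
            self.running.pop(task_id)
            done = CompletedTask(task_id, task_state, dict(target_states))
            self.completed[task_id] = done
            return done

    def status(self, task_id: str) -> str:
        with self._lock:
            if task_id in self.running:
                return "MX_TASK_RUNNING"
            return self.completed[task_id].task_state


reg = TaskRegistry()
reg.start(RunningTool("0000000001", "bdf", ["node1", "node2"]))
print(reg.status("0000000001"))  # MX_TASK_RUNNING
reg.finish("0000000001", "MX_TASK_COMPLETE",
           {"node1": "MX_TARGET_COMPLETE", "node2": "MX_TARGET_COMPLETE"})
print(reg.status("0000000001"), len(reg.running))  # MX_TASK_COMPLETE 0
```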

When the running tool completes its task, the DTF 340 may create a completed task object to contain the final results, and dereference the running tool because the running tool is no longer needed. The completed task object may be a container of status objects. The DTF 340 may have a hash table that contains references to all the completed task objects, including the status information. The status objects may include an overall task status object and individual target status objects.

The overall task status object may include a task state indicator that reports whether the task is completed, failed or cancelled. A reference to the runnable tool may be included so that a client that did not invoke the task may look up the definition of the task that was performed. The task state indicator may have one of the values shown in Table 3:
| Value of task state indicator | Meaning |
| --- | --- |
| MX_TASK_PENDING | The task does not yet have sufficient resources in the DTF to run, and so it is waiting. No targets have been contacted. |
| MX_TASK_RUNNING | The task is now running. |
| MX_TASK_COMPLETE | The task is complete and it did not fail. |
| MX_TASK_FAILED | The task is complete, and it failed either before any target was contacted or on all targets. |
| MX_TASK_SOME_FAILURES | The task is complete, and it failed on some targets while not failing on others. |
| MX_TASK_CANCELLED | The task was cancelled before it could complete on all specified targets. It might have failed on some targets and completed with no failures on others. |

Table 3



The individual target status objects may report, for example, whether or not the connection to the node is completed, and whether the execution of the tool on the node is successful. The target status object may contain a target state indicator, a count of the number of files copied, a failure cause indicator, an exit code value, and a reference to a target output object. The target state indicator may take on the values shown in Table 4:



| Value of target state indicator | Meaning |
| --- | --- |
| MX_TARGET_PENDING | The target has not yet been contacted because resources are not available in the DTF to start it. |
| MX_TARGET_COPYING | The tool has files that need to be copied to the target, and those files are currently being copied. |
| MX_TARGET_RUNNING | The command associated with the tool is now being executed on the target. |
| MX_TARGET_COMPLETE | The task has completed on the target and it did not fail. This is the only state in which the target status object contains a valid exit code value and a valid reference to a target output object that contains the resulting output from the execution of the command associated with the tool. |
| MX_TARGET_FAILED | The task has completed on the target and it failed. The failure cause indicator contains a value that indicates the cause of the failure. |
| MX_TARGET_CANCELLED | The task was cancelled on the target. The command associated with the tool was never executed. |
| MX_TARGET_KILLED | The command associated with the tool was running and was killed before it could complete. |

Table 4

If the target state indicator is MX_TARGET_COMPLETE, the target status object may contain a valid value for the command exit code and a valid reference to a target output object, which may contain the exit code, standard output (stdout) and standard error output (stderr) that resulted from running the command associated with the tool 240 on the target node 130. The agent typically returns the exit code as is, rather than trying to interpret it, because interpreting it may lead to conflicting results.
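The state values in Tables 3 and 4 can be modelled as enumerations, and the overall task state derived from the individual target states. The mapping in `overall_state` is an assumption inferred from the table descriptions, not a rule stated in the text; in particular, the "failed before any target was contacted" case of MX_TASK_FAILED is not modelled, because that outcome does not come from target states.

```python
from enum import Enum


class TargetState(Enum):
    PENDING = "MX_TARGET_PENDING"
    COPYING = "MX_TARGET_COPYING"
    RUNNING = "MX_TARGET_RUNNING"
    COMPLETE = "MX_TARGET_COMPLETE"
    FAILED = "MX_TARGET_FAILED"
    CANCELLED = "MX_TARGET_CANCELLED"
    KILLED = "MX_TARGET_KILLED"


class TaskState(Enum):
    PENDING = "MX_TASK_PENDING"
    RUNNING = "MX_TASK_RUNNING"
    COMPLETE = "MX_TASK_COMPLETE"
    FAILED = "MX_TASK_FAILED"
    SOME_FAILURES = "MX_TASK_SOME_FAILURES"
    CANCELLED = "MX_TASK_CANCELLED"


FINAL = {TargetState.COMPLETE, TargetState.FAILED,
         TargetState.CANCELLED, TargetState.KILLED}


def overall_state(targets: list[TargetState]) -> TaskState:
    """One plausible mapping from per-target states to a Table 3 task state."""
    if not targets:
        raise ValueError("a task needs at least one target")
    if all(t is TargetState.PENDING for t in targets):
        return TaskState.PENDING        # no target contacted yet
    if not all(t in FINAL for t in targets):
        return TaskState.RUNNING        # some target still copying or running
    if any(t in (TargetState.CANCELLED, TargetState.KILLED) for t in targets):
        return TaskState.CANCELLED      # stopped before completing everywhere
    failed = sum(t is TargetState.FAILED for t in targets)
    if failed == 0:
        return TaskState.COMPLETE
    if failed == len(targets):
        return TaskState.FAILED
    return TaskState.SOME_FAILURES


print(overall_state([TargetState.COMPLETE, TargetState.FAILED]).value)
# MX_TASK_SOME_FAILURES
```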

The status objects, the target output object and the runnable tool object are all serializable for transport to and from the DTF 340 via remote calls. Using remote calls to the DTF 340, the clients 310 may access these status and output objects and use them to display task and target status to the user 210.

After the DTF 340 passes the runnable tool to the agents 370 associated with the nodes 130, the agents 370 may execute the tool 240, step 418, and collect the target output, including the exit code, the stdout, and the stderr, step 420. Next, the DTF 340 may collect task results or failure reports from the agents 370 for each node 130, step 422, and update each individual target status, step 424.

After all target nodes have completed the execution, the DTF 260 may update the overall task status, step 426. The target liaisons 260 typically keep track of the individual target status by communicating with the agents 370 running on each of the target nodes 130. When all of the running tasks reach a final state, whether completed, failed or cancelled, the DTF 260 may return the task results or failure reports to the clients 310 and then to the user 210, step 428. The user 210 may monitor and review the task results by displaying them on a computer screen, step 432, printing them on a printer, step 434, writing them to a file, step 436, or writing them to a directory that contains one file for each node 130 requested, step 438.
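The last output option, one file per requested node in a directory (step 438), can be sketched with `java.nio.file` as below. The class name and the `<hostname>.out` naming convention are hypothetical, and a real implementation would also need to sanitize hostnames before using them as file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ResultDirectoryWriter {
    /**
     * Writes each node's report to its own file in dir and returns the files written,
     * in the iteration order of the map. "<hostname>.out" is a hypothetical convention.
     */
    static List<Path> writePerNode(Path dir, Map<String, String> reportByHostname) {
        List<Path> written = new ArrayList<>();
        try {
            Files.createDirectories(dir);
            for (Map.Entry<String, String> entry : reportByHostname.entrySet()) {
                Path file = dir.resolve(entry.getKey() + ".out");
                Files.write(file, entry.getValue().getBytes(StandardCharsets.UTF_8));
                written.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```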



Tool execution may involve copying files and/or running commands and programs. If there are files to be copied from the CMS 100 to the nodes 130, the DTF 340 typically opens the files on the CMS 100 and reads their contents before contacting any of the target nodes 130, so that errors may be detected before the target nodes 130 are contacted. If the files cannot be read, the DTF 340 may start a failure process and return a failure status to the user 210.
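A minimal sketch of this read-first check, assuming the files fit in memory: every file is read before any node would be contacted, and the first unreadable file turns the whole task into a failure. The class and field names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FilePreloader {
    /** Either the contents of every file, or a description of the first failure. */
    static final class Result {
        final Map<Path, byte[]> contents; // null when the pre-read failed
        final String failure;             // null when every file was read

        Result(Map<Path, byte[]> contents, String failure) {
            this.contents = contents;
            this.failure = failure;
        }

        boolean ok() {
            return failure == null;
        }
    }

    /** Reads all files up front; no target node should be contacted unless ok() is true. */
    static Result preload(List<Path> files) {
        Map<Path, byte[]> contents = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                contents.put(file, Files.readAllBytes(file));
            } catch (IOException e) {
                return new Result(null, "cannot read " + file + ": " + e);
            }
        }
        return new Result(contents, null);
    }
}
```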

The DTF 340 may be multi-threaded in that it may accept multiple, simultaneous requests and may simultaneously perform multiple tasks on multiple managed nodes 130. There may be limits on the number of tasks that may be in process at one time and on the total number of node connections that may be active, so as not to overwhelm the resources of the SCM module 110.

First, the DTF 340 may enforce a limit on the maximum number of simultaneous task executions, in order to limit resource consumption on the server. For example, if the limit is ten tasks at a time and the DTF 340 tries to run an eleventh task while ten tasks are already running, the eleventh task will wait until one of the ten finishes.
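One way to get this behavior in Java, assuming each task runs on its own worker thread, is a fixed-size thread pool: with ten workers, an eleventh submitted task waits in the pool's queue until a worker is free. The sketch below measures the peak number of tasks running at once; the class name and the sleep standing in for real work are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TaskLimitSketch {
    /**
     * Submits taskCount dummy tasks to a pool of maxTasks workers and returns the
     * highest number observed running at once. Tasks beyond maxTasks queue up.
     */
    static int peakConcurrency(int maxTasks, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(maxTasks);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for running a tool on its targets
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

`Executors.newFixedThreadPool` uses an unbounded queue, so waiting tasks are never rejected; they simply start as earlier tasks finish.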

There may also be a limit on the maximum number of nodes 130 with which the DTF 340 may communicate at a time across all of the tasks. For example, if the limit is sixteen and a task needs to be run on sixty-five different nodes 130, then only sixteen nodes 130 will be contacted by the DTF 340 at once, and the rest will wait until one or more of the sixteen complete the task, so that only sixteen nodes 130 are running at a time. The purpose is again to control memory resources so that the CMS 100 will not be overwhelmed by a large number of requests at the same time.
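Because this limit applies across all tasks, a natural sketch is a single counting semaphore shared by every task: each node contact acquires a permit and releases it when the node finishes. The class below is hypothetical and uses one thread per node only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class NodeLimitSketch {
    private final Semaphore nodeSlots; // shared by every task run through this object
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    NodeLimitSketch(int maxNodes) {
        this.nodeSlots = new Semaphore(maxNodes, true); // fair: waiting nodes go in order
    }

    /** Contacts one node, blocking while maxNodes nodes are already in use. */
    void contactNode(String hostname) throws InterruptedException {
        nodeSlots.acquire();
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            Thread.sleep(5); // stand-in for running the tool on hostname
        } finally {
            active.decrementAndGet();
            nodeSlots.release();
        }
    }

    /** Runs one task on nodeCount nodes and returns the peak number of active nodes. */
    int runTask(int nodeCount) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            String host = "node" + i;
            Thread t = new Thread(() -> {
                try {
                    contactNode(host);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return peak.get();
    }
}
```

Several tasks calling `runTask` on the same instance would share the sixteen permits, which matches the "for all of the tasks" scope described above.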

Task execution is achieved through communication and interaction between the agents 370 and the target liaison objects 260 on the CMS 100. The target liaison objects 360 may be created by the DTF 340 to keep track of the corresponding target nodes 130 and to establish one-to-one communication between the target liaisons 260 on the CMS 100 and the agents 370 running on the target nodes 130. To create a target liaison object 260, the DTF 340 may initialize it using the passed-in arguments, which include the task identifier, the hostname of the target with which it communicates, the number of files to be copied, and a reference to the running tool. Next, the DTF 340 may contact the agents 370 running on the target nodes 130 via the RMI registries on the nodes 130 (described later). The DTF 340 may pass the remote reference, the task definition, and a digital signature of the passed arguments to the agents 370 associated with the nodes 130. From that point on, execution of the task on the target nodes 130 is under the control of the agents 370 running on the nodes 130.
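The sketch below shows a target liaison holding the initialization arguments listed above, together with signing and verifying those arguments using `java.security.Signature`. The signature algorithm (SHA256withRSA), the byte layout of the signed payload and all class names are assumptions; the text does not specify them.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class TargetLiaison {
    final String taskId;
    final String hostname;
    final int fileCount;
    final Object runningToolRef; // stand-in for the remote reference to the running tool

    TargetLiaison(String taskId, String hostname, int fileCount, Object runningToolRef) {
        this.taskId = taskId;
        this.hostname = hostname;
        this.fileCount = fileCount;
        this.runningToolRef = runningToolRef;
    }

    /** Canonical bytes of the arguments to sign; this layout is an assumption. */
    byte[] signedPayload(String taskDefinition) {
        String text = taskId + "\n" + hostname + "\n" + fileCount + "\n" + taskDefinition;
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** CMS side: signs the payload with the server's private key. */
    static byte[] sign(byte[] payload, PrivateKey key) {
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initSign(key);
            sig.update(payload);
            return sig.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Agent side: accepts the task only if the signature matches the arguments. */
    static boolean verify(byte[] payload, byte[] signature, PublicKey key) {
        try {
            Signature sig = Signature.getInstance("SHA256withRSA");
            sig.initVerify(key);
            sig.update(payload);
            return sig.verify(signature);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```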

The SCM agents 370 may be the software components that are installed on all the managed nodes 130 in an SCM cluster and that perform tasks on the nodes 130 on behalf of the DTF 340. The agents 370 typically communicate with the DTF 340 via Java Remote Method Invocation (RMI) calls and register singleton objects with the Java RMI registries running on the nodes. Java RMI is a distributed object model for the Java platform that extends the Java object model beyond a single virtual machine address space, so that executable code can be dynamically distributed on demand, including all necessary code for distributed applications. The term "Java" is a trademark of Sun Microsystems, Inc.
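A minimal Java RMI sketch of an agent registering a singleton object in a registry on its node. The `ScmAgent` interface, its `runTool` method and the binding name "ScmAgent" are hypothetical; `LocateRegistry`, `UnicastRemoteObject.exportObject` and `Registry.rebind` are the standard `java.rmi` calls.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface; every remote method must declare RemoteException.
interface ScmAgent extends Remote {
    String runTool(String taskId, String command) throws RemoteException;
}

final class ScmAgentImpl implements ScmAgent {
    @Override
    public String runTool(String taskId, String command) {
        // A real agent would hand the task to a tool runner; this only acknowledges it.
        return "accepted " + taskId;
    }

    /**
     * Creates a registry on this node, exports the single agent instance and binds
     * its stub under a well-known name so the DTF can look it up remotely.
     */
    static Registry registerSingleton(int port, ScmAgentImpl agent) throws RemoteException {
        Registry registry = LocateRegistry.createRegistry(port);
        ScmAgent stub = (ScmAgent) UnicastRemoteObject.exportObject(agent, 0);
        registry.rebind("ScmAgent", stub);
        return registry;
    }
}
```

On the CMS side, the DTF would obtain a stub with `LocateRegistry.getRegistry(host, port)` followed by `lookup("ScmAgent")`, and calls on that stub would then travel over RMI to the node.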

The execution of the task on the target nodes 130 may start with the agents 370 unpacking the task information and the tool definition encapsulated within the runnable tool. The agents 370 may be connected with the corresponding target liaison object 260 at the CMS 100, and therefore may report any changes, for example, a cancellation, quickly back to the DTF 340.

The agents 370 running on the managed nodes 130 may need to execute tasks with minimal intrusion, i.e., using the least amount of resources, because the managed nodes may be web servers or database servers that have other important tasks. Therefore, there may be a limit on the number of simultaneous tasks that can be performed by the agents 370. When a remote call is made to run a tool 240 on a target node 130, the agent 370 may check to see if there is a tool runner object in the free list. If there is, the agent 370 may remove the tool runner from the free list, initialize it, and then, using the task identifier as the key, add it to the active runner list. Next, a thread may be created and passed to the tool runner. The task has now been launched, with the tool runner doing most of the work. On the other hand, if there are no free tool runners, i.e., when the task capacity of the agent 370 has been reached, any subsequent attempt to start a new task on the agent 370 may result in an exception back to the DTF 340. The DTF 340 may attempt to run the task on any other pending target nodes 130 before retrying with the target node 130 that is at its limit. This may allow the task to continue on other nodes 130 that may be less loaded. If there are no other target nodes 130 on which to run the task, the DTF 340 may wait a short time, for example, one second, and retry starting the task. This may continue until the target node 130 completes another task and accepts the new one, or until the user 210 cancels the task. After the tool runner completes the task, the agent 370 may remove the tool runner from the active list and place it on the free list.
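The free-list and active-list bookkeeping described above can be sketched as a small pool of reusable tool runners. The class names and the exception type are hypothetical; the sketch only shows the agent side, throwing an exception when capacity is reached so that the caller (the DTF) can try other nodes and retry later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class AgentRunnerPool {
    /** Hypothetical exception reported back to the DTF when every runner is busy. */
    static final class CapacityReachedException extends RuntimeException {
        CapacityReachedException(String message) {
            super(message);
        }
    }

    static final class ToolRunner {
        String taskId;

        void init(String taskId) {
            this.taskId = taskId;
        }
    }

    private final Deque<ToolRunner> freeList = new ArrayDeque<>();
    private final Map<String, ToolRunner> activeRunners = new HashMap<>(); // keyed by task id

    AgentRunnerPool(int capacity) {
        for (int i = 0; i < capacity; i++) {
            freeList.push(new ToolRunner());
        }
    }

    /** Moves a free runner to the active list and starts the work on a new thread. */
    synchronized Thread start(String taskId, Runnable work) {
        ToolRunner runner = freeList.poll();
        if (runner == null) {
            throw new CapacityReachedException("agent at capacity; task " + taskId + " refused");
        }
        runner.init(taskId);
        activeRunners.put(taskId, runner);
        Thread thread = new Thread(() -> {
            try {
                work.run();
            } finally {
                finish(taskId); // return the runner even if the work fails
            }
        });
        thread.start();
        return thread;
    }

    /** Moves the runner for taskId from the active list back to the free list. */
    private synchronized void finish(String taskId) {
        ToolRunner runner = activeRunners.remove(taskId);
        if (runner != null) {
            freeList.push(runner);
        }
    }

    synchronized int freeCount() {
        return freeList.size();
    }
}
```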

These limitations, i.e., the task limit, the node limit and the agent limit, may all be customized by the user 210 depending upon the resources available.
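As an illustration of making the three limits configurable, the sketch below reads them from a `java.util.Properties` object and falls back to caller-supplied defaults. The property keys and the class name are hypothetical; the text does not say how the limits are stored.

```java
import java.util.Properties;

final class ScmLimits {
    final int maxTasks;      // simultaneous tasks the DTF will run
    final int maxNodes;      // simultaneous node connections across all tasks
    final int maxAgentTasks; // simultaneous tasks per agent (tool runners)

    ScmLimits(int maxTasks, int maxNodes, int maxAgentTasks) {
        this.maxTasks = maxTasks;
        this.maxNodes = maxNodes;
        this.maxAgentTasks = maxAgentTasks;
    }

    /** Overrides each default with a property value when one is present. */
    static ScmLimits from(Properties props, ScmLimits defaults) {
        return new ScmLimits(
                read(props, "scm.maxTasks", defaults.maxTasks),
                read(props, "scm.maxNodes", defaults.maxNodes),
                read(props, "scm.maxAgentTasks", defaults.maxAgentTasks));
    }

    private static int read(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        if (value == null) {
            return fallback;
        }
        int limit = Integer.parseInt(value.trim());
        if (limit < 1) {
            throw new IllegalArgumentException(key + " must be at least 1, got " + limit);
        }
        return limit;
    }
}
```

For example, `ScmLimits.from(props, new ScmLimits(10, 16, 4))` keeps the ten-task and sixteen-node figures from the examples above unless overridden; the agent limit of 4 is an arbitrary placeholder.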

An agent status object, parallel to the target status object, may be used to report the status of the task running on the individual nodes 130. The initial value of the agent status object may be MX_AGENT_TR_PENDING. After a call is made to run a tool 240 on the node 130, the agent 370 running on the node 130 may first check to see if the tool 240 specifies any files to be copied. If so, the tool runner may update the agent status value to MX_AGENT_TR_COPYING and then copy the files into place. Errors that result from copying files may result in a final agent status value of MX_AGENT_TR_FAILED or MX_AGENT_TR_CANCELLED, and a failure may be reported.

If there are no files to copy, or after all such files have been copied, the runner may check the kill request flag to see if a kill task call has occurred in another thread. If so, the runner may update the agent status value to MX_AGENT_TR_KILLED and report a failure. If not, the runner may update the agent status value to MX_AGENT_TR_RUNNING and continue. The tool runner may then run the commands associated with the tool 240 in a separate process and gather up the exit code, stdout and stderr.
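The status transitions in the two paragraphs above can be sketched as a small tool runner built on `ProcessBuilder`. The enum mirrors the MX_AGENT_TR_* values, except COMPLETE, which is an assumed final state not named here; the copy step is abstracted behind a hypothetical `FileCopier` interface, and treating a thread interrupt as a kill is a design choice of this sketch. Output is redirected to temporary files so that a chatty command cannot block on a full pipe.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Mirrors MX_AGENT_TR_*; COMPLETE is an assumed final state.
enum AgentState { PENDING, COPYING, RUNNING, COMPLETE, FAILED, CANCELLED, KILLED }

final class ToolRunnerSketch {
    /** Hypothetical copy step; returns false if any file could not be put in place. */
    interface FileCopier {
        boolean copyAll();
    }

    volatile AgentState state = AgentState.PENDING;
    final AtomicBoolean killRequested = new AtomicBoolean(); // set by a kill call on another thread
    int exitCode = -1;
    String stdout = "";
    String stderr = "";

    void run(List<String> command, FileCopier copier) {
        if (copier != null) {
            state = AgentState.COPYING;
            if (!copier.copyAll()) {
                state = AgentState.FAILED;
                return;
            }
        }
        if (killRequested.get()) {
            state = AgentState.KILLED;
            return;
        }
        state = AgentState.RUNNING;
        try {
            File out = File.createTempFile("tool", ".out");
            File err = File.createTempFile("tool", ".err");
            try {
                Process p = new ProcessBuilder(command).redirectOutput(out).redirectError(err).start();
                try {
                    exitCode = p.waitFor(); // reported as-is, not interpreted
                } catch (InterruptedException e) {
                    p.destroy();
                    Thread.currentThread().interrupt();
                    state = AgentState.KILLED;
                    return;
                }
                stdout = new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
                stderr = new String(Files.readAllBytes(err.toPath()), StandardCharsets.UTF_8);
                state = AgentState.COMPLETE;
            } finally {
                out.delete();
                err.delete();
            }
        } catch (IOException e) {
            state = AgentState.FAILED;
        }
    }
}
```

Running it with a command such as `Arrays.asList("sh", "-c", "echo out; exit 3")` (which assumes a POSIX shell on the node) would leave the runner in COMPLETE with an exit code of 3.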

An integral part of the SCM functionality may be the ability to record and maintain a history of events by logging both SCM configuration changes and task execution events through the log manager 334. SCM configuration changes may include adding, modifying and deleting users and nodes in the SCM module 110, and creating, modifying and deleting node groups 132 and tools 240. Task execution events may include details and intermediate events associated with the running of a tool 240. The details may include the identity of the user 210 who launched the task, the task identifier, the task start time, the actual tool and command line with arguments, and the list of target nodes 130. The intermediate events may include the beginning of a task on a managed node 130, exceptions that occur in attempting to run a tool 240 on a node 130, and the final result, if any, of the task. The exit code, stdout and stderr, if they exist, may also be logged.
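A minimal in-memory stand-in for the log manager 334, assuming one text line per event. The class name, method names and the key=value line format are hypothetical; a real log manager would persist entries rather than keep them in a list.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TaskEventLog {
    private final List<String> entries = new ArrayList<>();

    /** Configuration change, e.g. adding a node or deleting a tool. */
    synchronized void configChanged(String user, String change) {
        entries.add(String.format("CONFIG user=%s change=%s", user, change));
    }

    /** Task details, recorded once when the task is launched. */
    synchronized void taskStarted(String user, String taskId, Instant start,
                                  String commandLine, List<String> targets) {
        entries.add(String.format("TASK_START task=%s user=%s start=%s cmd=\"%s\" targets=%s",
                taskId, user, start, commandLine, String.join(",", targets)));
    }

    /** Intermediate event: the task began on one managed node. */
    synchronized void nodeStarted(String taskId, String hostname) {
        entries.add(String.format("NODE_START task=%s node=%s", taskId, hostname));
    }

    /** Intermediate event: an exception while trying to run the tool on a node. */
    synchronized void nodeException(String taskId, String hostname, Exception e) {
        entries.add(String.format("NODE_ERROR task=%s node=%s error=%s", taskId, hostname, e));
    }

    /** Final result for one node, with exit code, stdout and stderr. */
    synchronized void nodeResult(String taskId, String hostname, int exitCode,
                                 String stdout, String stderr) {
        entries.add(String.format("NODE_RESULT task=%s node=%s exit=%d stdout=%s stderr=%s",
                taskId, hostname, exitCode, stdout, stderr));
    }

    synchronized List<String> entries() {
        return Collections.unmodifiableList(new ArrayList<>(entries));
    }
}
```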


While the present invention has been described in connection with an exemplary embodiment, it will be understood that many modifications will be readily apparent to those skilled in the art, and this application is intended to cover any variations thereof.
